[57] ABSTRACT

A DMA controller including an XOR FIFO buffer and XOR 
circuitry for computation of parity. The DMA controller 
resides within a RAID controller and establishes a direct 
data connection from host memory to subsystem local 
memory in order to allow the CPU to perform other functions. 
The DMA controller accesses data segments from host 
memory corresponding to blocks of data within a disk stripe. 
As the data is transferred from host memory to subsystem 
local memory, the XOR circuitry simultaneously computes 
the parity corresponding to the successive data segments. 
Computing parity substantially simultaneously with the 
DMA data transfer reduces memory bandwidth utilization 
on the memory bus of the RAID controller. The parity is 
stored in the XOR buffer. Once parity is computed for a 
portion of data segments corresponding to a data stripe, the 
parity is transferred to local memory for retention. These 
steps are repeated until the full stripe is read into local 
memory and a parity value is computed for the entire data 
stripe. Once the RAID controller is ready to post the data to 
disk, the data is transferred from local memory to disk. The 
DMA controller of the present invention may also be advantageously 
applied when performing partial stripe writes by 
reducing the memory bandwidth utilization required to 
compute partial parity values. 

16 Claims, 3 Drawing Sheets 
DMA CONTROLLER OF A RAID STORAGE CONTROLLER WITH INTEGRATED XOR PARITY COMPUTATION CAPABILITY ADAPTED TO COMPUTE PARITY IN PARALLEL WITH THE TRANSFER OF DATA SEGMENTS

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to data storage subsystems, and more particularly, to a DMA controller with integrated XOR parity computation capability adapted to compute parity in parallel with the transfer of data segments.

2. Discussion of Related Art

Redundant Arrays of Inexpensive Disks (RAID) systems are disk array storage systems designed to provide large amounts of data storage capacity, data redundancy for reliability, and fast access to stored data. RAID provides data redundancy to recover data from a failed disk drive and thereby improve reliability of the array. Although the disk array includes a plurality of disks, to the user the disk array is mapped by RAID management techniques within the storage subsystem to appear as one large, fast, reliable disk.

There are several different methods to implement RAID. RAID level 1 mirrors the stored data on two or more disks to assure reliable recovery of the data. Other common implementations of RAID, levels 3, 4, and 5, distribute data across the disks in the array and provide for a block (or multiple blocks) of redundancy information (e.g., parity) that is also distributed over the disk drives. On each disk, data is mapped and stored in predefined blocks generally having fixed size. A predefined number of blocks of data and redundancy information (e.g., parity), from each disk of the array, are mapped to define a stripe of data. One common type of stripe, the parallel stripe, provides load balancing across the disks in the array by defining the stripe as parallel blocks of data across the disk array.

In RAID levels 3 and 4, the redundant information, that is, parity information, is stored in a dedicated parity disk. In a RAID level 5 implementation, the parity information is interleaved across all the disks in the array as a part of the stripe.

RAID levels 3, 4, and 5 suffer I/O performance degradation due to the number of additional read and write operations required in data redundancy algorithms. RAID controllers often include local memory subsystems (e.g., cache) used to temporarily store data and parity involved in a host I/O operation and thereby mitigate the performance degradation of the redundancy techniques.

There are two common write methods implemented to write new data and associated new parity to the disk array. The two methods are the Full Stripe Write method and the Read-Modify-Write method, also known as a partial stripe write method. If a write request indicates that only a portion of the data blocks in any stripe are to be updated, then the Read-Modify-Write method is generally used to write the new data and to update the parity block of the associated stripe. The Read-Modify-Write method involves the steps of: 1) reading into local memory old data from the stripe corresponding to the blocks to be updated by operation of the write request, 2) reading into local memory the old parity data for the stripe, 3) performing an appropriate redundancy computation (e.g., a bit-wise Exclusive-OR (XOR) operation to generate parity) using the old data, old parity data, and the new data, to generate a new parity data block, and 4) writing the new data and the new parity data block to the proper data locations in the stripe.

If all the blocks in a stripe are available in the local memory or provided in the write request, then a Full Stripe Write is possible. In a Full Stripe Write, the parity computation is an XOR of all the data blocks within a stripe. The Full Stripe Write avoids the need to use old parity data during the new parity computation. Full Stripe Write improves I/O performance because a memory access is not required to read the old parity data from disk and to place a copy of the old parity in local memory.

It is known to use a DMA circuit in a RAID controller to transfer data from a source to a destination. Exemplary of such a DMA transfer is the exchange of data between a host system memory and the RAID controller local memory (cache or other buffers). A request is made to the DMA circuit to perform a data transfer. The DMA controller establishes a direct data path between the host RAM and the local memory (e.g., cache). Thus, the DMA allows the RAID controller central processing unit (CPU) to perform other tasks while the data exchange occurs in parallel. In the case of a write operation from the host to the RAID subsystem, the RAID controller CPU reads the data from local memory and computes required parity as noted above. The disk drive controller is programmed to transfer the data and new parity from the RAID subsystem local memory to the disk array.

The local memory is therefore accessed a number of times for each such complete write operation. First, the local memory is written as the data is transferred from the host. The same data is then read to compute the parity data, and finally the data is read again to write the data and associated parity to the disk array. Each of these local memory accesses utilizes valuable memory bandwidth in the RAID controller. It is desirable to reduce the local memory bandwidth utilized for each write operation so as to improve the overall I/O performance of the RAID subsystem.

Some prior techniques and devices have integrated parity computation circuits with the DMA controller to simplify or speed the computation of XOR parity data. Such known techniques tend to integrate the XOR computation with the DMA controller such that the computation is performed at the "back-end" of the RAID controller data transfers. In other words, the DMA controller performs the XOR parity computation as the data is transferred from the RAID controller local memory to the disk array. In such methods, the DMA controller reads the stripes of data to be written from RAID subsystem local memory and simultaneously computes the parity of the stripe as it transfers data to the disk array.

Back-end parity computations generally require that the disk drives be operable in a synchronized manner such that the parity computation and DMA transfer operate in "lock-step" among a plurality of disk drive transfer operations. Parity is computed using related portions (segments) of the stripe. The XOR computation circuits must therefore receive the proper sequence of related bytes in related segments to compute a correct XOR parity segment for the related segments.

Such "lock-step" operation is used in older technology disk drives such as integrated drive electronics (IDE) interface devices because the RAID controller is more directly controlling the data transfer. IDE drives run single threaded in that each data transfer requires a handshake. Each transfer of data (e.g., byte or 16-bit word) requires a request to the RAID controller and acknowledgment of the data delivery by the disk drive controller before the next unit of data is transferred.
To accommodate this precision timed lock-step approach, a high speed static RAM (SRAM) buffer is commonly used in conjunction with the DMA transfer to assure readiness of the data when the DMA is requested to transfer the next unit of data to the disk drives. Not only is such an additional SRAM buffer somewhat costly, but it requires that the local memory data be read once again to transfer the data block from the lower speed local memory to the high speed SRAM transfer buffer.

Such back-end DMA/parity computations are not well suited to today's RAID systems that utilize disk drive devices having substantial buffering and intelligence within the drive device, for example a SCSI disk drive. The use of the SCSI drive device allows the SCSI controller to control the data transfer. The SCSI controller takes control of the bus and issues commands to transfer data from local memory (e.g., cache), rather than the CPU utilizing the DMA to transfer data to the disk drive. Higher performance SCSI disk drives typically contain significant buffering and computational intelligence to optimally order a plurality of commands queued within the drive itself (in a buffer local to the drive). For example, some SCSI disk drives have the computational intelligence for command queuing and elevator sorting. Such optimizations are often key to achieving the specified performance levels of the disk drives. SCSI controllers optimize performance by sorting I/O requests before saving data or before retrieving data. Therefore, the order in which the I/O requests were received does not matter because the SCSI controller will sort the I/O requests to optimize data retrieval and data storage to disk.

These optimization features are defeated by the lock-step sequences required by the known back-end DMA/Parity techniques. In these cases, the substantial buffering within the drive device is not effectively utilized because the parity computation may be corrupted if the related segments are not transferred in the proper sequence. For example, one of the plurality of SCSI disk drives relating to a particular stripe may determine for any of several reasons that the buffer cannot handle further data at this time, or a SCSI drive may choose to resequence operations in its buffer to optimize drive operations. Such a determination by one drive may require logic to stop the DMA/Parity operations to all drives so as to assure proper sequencing of the stripe data through the XOR circuits. Such additional logic to assure lock-step sequencing of all drives in a stripe serves to defeat the intelligence and buffering of high speed drives, thereby negatively impacting overall subsystem performance.

It is evident from the above discussion that a need exists for enhanced DMA/Parity circuits which overlap parity computation with data transfer while reducing bandwidth requirements for local memory without substantially increasing hardware costs.

SUMMARY OF THE INVENTION

The present invention solves the above and other problems, thereby advancing the useful arts, by providing a DMA controller in a RAID controller which performs XOR parity computation substantially in parallel (simultaneous) with the transfer of data at the "front-end" of the data exchange operations. By performing the parity computation in parallel with the front-end data transfer (from the host to the RAID controller local memory), the need for lock-step synchronization with the disk drives is obviated.

The present invention transfers a segmented block of data in a predetermined order from the source memory in order to perform the early parity computation. The process of segmenting source data, as defined by a scatter/gather list, is applicable in a contiguous or non-contiguous memory. A gather list is a data structure linking blocks of data in a predetermined order for the purpose of transferring the blocks of data from particular source addresses which may or may not be contiguous. A scatter list is a data structure linking blocks of data in a predetermined order for the purpose of transferring the blocks of data to particular destination addresses which may or may not be contiguous.

For example, to transfer the data segments in a predetermined order from host memory to the local memory, the gather list stored within the disk array controller contains, in a specific order, a list of host addresses translated to a series of internal bus addresses. The scatter list stored within the disk array controller determines the destination address of the data segments in local memory.

As the data segments are transferred, in the sequence defined by the scatter/gather lists, circuits in the DMA controller of the present invention monitor (or "snoop") the disk controller internal bus to capture the data as it is transferred. The captured data is then used in XOR parity computations as described herein. Address ranges defined in the controller are used to determine whether addresses on the disk controller internal bus correspond to a valid data segment of the stripe. The circuitry performing the XOR parity computation uses the programmed address ranges to determine the data to "snoop" or read as it is transferred to local memory. The XOR parity circuitry computes the parity of the "snooped" data segments as they are transferred to the destination.

In particular, the present invention is applicable to RAID controllers that attach directly to the host system's main bus (e.g., a PCI bus). The DMA controller of the RAID controller therefore transfers data from the host system's main memory on the PCI bus to the RAID controller local memory. The DMA controller of the present invention enables the RAID subsystem controller to read data from the host at the subsystem's direction.

The present invention allows the RAID subsystem to control the ordering and the size of the DMA data transfer. Previously, the host directed the transfer of a contiguous block of data to the subsystem local memory, such as cache, and the RAID subsystem later performed the parity generation. The present invention allows the RAID subsystem to direct the transfer of a segmented block of data and perform the early parity computation, as the data is stored in subsystem local memory, thus allowing the subsystem to eliminate one read operation to local memory.

The present invention is best suited to, though not exclusively suited to, RAID controllers that make use of the Full Stripe Write method. The early parity computations reduce the number of read operations from local memory or disk because all the data in the stripe is used to compute parity as it is transferred from the host system. The parity computation for the full stripe is therefore completed in parallel with the transfer of the stripe from the host memory to the RAID local memory. Therefore, the RAID subsystem does not need to access any further data from disk or local memory to compute parity. After computing the parity corresponding to a portion of a data stripe, the resultant parity data within the DMA circuit of the present invention is stored in local memory until the RAID controller is ready to post the computed parity data to disk.

The present invention interfaces with the host's PCI bus using a PCI bus bridge as the front-end interface. A person
skilled in the art will recognize that the present invention may be applied to many commercially known bus structures. The PCI bus is but one example of a presently available, commercially popular bus for peripheral device interconnection in host systems as well as for busses within intelligent peripheral devices such as a RAID controller.

The exemplary use of the present invention, as presented 
herein, is in a RAID 5 disk array subsystem. A person skilled 
in the art will recognize the present invention is operable in 
other types of RAID disk array arrangements. 

As used herein, a stripe comprises a plurality of blocks, one on each of a plurality of data disks and one on a parity disk. The block on the parity disk is computed as the bitwise Exclusive-OR (XOR) of the corresponding blocks on the data disks of the stripe. Specifically, the XOR of each first bit of each block on each data disk generates the first bit of the parity block. The XOR of the second bit of each data block generates the second bit of the parity block, etc. Naturally, the computations are performed in more convenient units such as 8, 16, or 32 bit words.
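The parity definition above can be illustrated with a short sketch. This is only an illustration in Python, not the circuit described in this document; the function name `stripe_parity` and the sample byte values are assumptions chosen for the example.

```python
def stripe_parity(data_blocks):
    """Return the bytewise XOR of equally sized data blocks (illustrative only)."""
    if not data_blocks:
        raise ValueError("a stripe needs at least one data block")
    size = len(data_blocks[0])
    if any(len(block) != size for block in data_blocks):
        raise ValueError("all blocks in a stripe must have the same size")
    parity = bytearray(size)  # starts as all zeros
    for block in data_blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte  # XOR position i of every block together
    return bytes(parity)


# Three hypothetical two-byte data blocks, one per data disk.
blocks = [bytes([0x0F, 0xF0]), bytes([0x33, 0x33]), bytes([0x55, 0xAA])]
parity = stripe_parity(blocks)  # bytes([0x69, 0x69])
```

Because XOR is associative and commutative, the blocks may be folded into the parity in any order, which is what allows parity to be accumulated incrementally as data arrives.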

As stored in host memory, the data blocks of the stripe are 
generally sequential in contiguous memory. Prior techniques 
have generally transferred data from such a host memory to 
local memory in the same sequential order. The present 
invention, by contrast, transfers such data in a specific 
non-sequential manner to perform XOR parity computations 
in parallel with the DMA transfer while minimizing the 
intermediate buffer space required to do so. 
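One way to picture the non-sequential order is as a segment-major walk over the host buffer: the same segment index is taken from every block before moving to the next segment index. The sketch below is an assumption-laden illustration; `segment_major_order`, the block size of 1024 bytes, and the segment size of 512 bytes are hypothetical and not taken from this document.

```python
def segment_major_order(num_blocks, block_size, segment_size):
    """Yield (block, segment, host_offset) with the segment index as the outer loop.

    Assumes the blocks of the stripe are contiguous in host memory.
    """
    if block_size % segment_size:
        raise ValueError("block_size must be a multiple of segment_size")
    segments_per_block = block_size // segment_size
    for segment in range(segments_per_block):
        for block in range(num_blocks):
            yield block, segment, block * block_size + segment * segment_size


# Hypothetical stripe: 3 data blocks of 1024 bytes, moved in 512-byte segments.
order = list(segment_major_order(3, 1024, 512))
# Host offsets visited: 0, 1024, 2048, then 512, 1536, 2560.
```

With this ordering, only one segment's worth of intermediate parity has to be held at a time, rather than a full block.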

In accordance with the present invention, the RAID 
controller uses a DMA controller coupled to an XOR buffer. 
The XOR buffer comprises logic circuits for capturing the 
data as the DMA controller transfers from the host memory 
to the local memory and computing XOR parity therefrom 
and further comprises a FIFO buffer for accumulating, that 
is, storing the XOR computation intermediate results. As the 
DMA controller transfers a plurality of data segments from 
host RAM to local memory, such as cache or nonvolatile 
memory, the XOR buffer substantially simultaneously com- 
putes the parity of the data segment and stores the XOR 
parity result in the FIFO of the XOR buffer. The FIFO of the 
XOR buffer of the present invention preferably stores 512 
bytes preferably arranged as 128 32-bit wide entries. 

More specifically, in response to a write request for a 
stripe, the DMA transfers the first data segment of the first 
block of the stripe from the host system memory to local 
memory. As this first data segment is written to local 
memory, the XOR buffer "snoops," that is, the XOR buffer 
reads the first data segment from the RAID subsystem 
internal bus as the DMA transfers the data. The snooped data 
is stored in the FIFO of the XOR buffer as it is snooped from 
the internal bus. A second data segment corresponding to the 
first segment of the second block of data from the data stripe 
is then transferred by the DMA from the host system 
memory to local memory. The XOR buffer snoops the 
second data segment from the RAID subsystem internal bus 
as the second data segment is copied into local memory. The 
first data segment (stored in the FIFO of the XOR buffer) 
and the second data segment are XOR'd as the DMA 
transfers the data to obtain an intermediate parity result. A 
third data segment corresponding to the first segment of the 
third block of data from the data stripe is transferred from 
host memory to local memory. The XOR buffer snoops the 
third data segment from the RAID subsystem internal bus 
and performs the XOR of the previous intermediate parity 
result and the third data segment which results in an updated 
intermediate parity. This process continues for all first 
segments of remaining blocks of the stripe until a final parity segment is generated. This final parity computation represents the parity segment of the first segments of all blocks of the data stripe. The DMA transfers this final parity segment to local memory for further processing.

The present invention repeats this process for the second segments of all blocks, the third segments, etc. until all data segments within host memory corresponding to a stripe are transferred to local memory and the parity is computed for the corresponding data segments. The final parity segment result for each segment of the stripe is transferred to local memory. The XOR buffer is reinitialized when computing parity for a new set of segments. Later, when the RAID controller is ready to post the data to disk, the disk controller is configured to retrieve data from the local memory along with corresponding parity already computed and stored in local memory and write it to disk.
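The full stripe sequence described above can be modeled in software as a rough sketch. The class `XorFifo` and function `full_stripe_transfer` are hypothetical names; the model resets the buffer to zero before each set of segments and XORs every segment into it, which gives the same result as storing the first segment and XORing the rest. It stands in for the snooping hardware only loosely.

```python
FIFO_BYTES = 512  # buffer capacity stated for the preferred embodiment


class XorFifo:
    """Software stand-in for the XOR buffer; not the actual circuit."""

    def __init__(self, size=FIFO_BYTES):
        self.data = bytearray(size)

    def reset(self):
        self.data[:] = bytes(len(self.data))  # reinitialize to all zeros

    def snoop(self, segment):
        if len(segment) > len(self.data):
            raise ValueError("segment larger than the buffer")
        for i, byte in enumerate(segment):
            self.data[i] ^= byte  # accumulate intermediate parity


def full_stripe_transfer(host_blocks, segment_size=FIFO_BYTES):
    """Copy blocks to 'local memory' segment by segment, accumulating parity."""
    block_size = len(host_blocks[0])
    if any(len(block) != block_size for block in host_blocks):
        raise ValueError("all blocks must have the same size")
    local_blocks = [bytearray(block_size) for _ in host_blocks]
    parity = bytearray(block_size)
    fifo = XorFifo(segment_size)
    for start in range(0, block_size, segment_size):
        end = min(start + segment_size, block_size)
        fifo.reset()  # new set of segments
        for index, source in enumerate(host_blocks):
            segment = source[start:end]
            local_blocks[index][start:end] = segment  # the DMA copy
            fifo.snoop(segment)                       # parity in parallel
        parity[start:end] = fifo.data[:end - start]   # store final parity segment
    return [bytes(block) for block in local_blocks], bytes(parity)
```

In this model each data byte is touched once on its way to local memory, and no second pass over local memory is needed to obtain parity.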

Though the present invention is best suited to such full stripe write operations, it may also be applied to partial stripe write operations. If all the data blocks corresponding to the disk stripe are not present in the source, a Read-Modify-Write operation is often used by the RAID controller in response to a write request. The present invention, though best suited to full stripe writes, is nonetheless operable to reduce the number of read operations required. When a partial write operation is executed, the RAID subsystem transfers data or parity (depending on the RAID level and the number of blocks not present in host memory) from disk to the local memory. An XOR buffer coupled to the DMA controller is loaded with data from local memory which is then XOR'd with the new data transferred via the DMA controller from the host memory.

In the case of a single block being updated, the RAID controller may be configured to transfer the old data and old parity for that block from disk to local memory. The XOR buffer coupled to the DMA controller computes the XOR of the old data and old parity. This intermediate parity is loaded segment by segment into the XOR buffer as each segment of new data is transferred from the host system. After each segment is transferred, the XOR buffer contents, which contain the new parity, are stored in local memory. In this manner, the invention reduces required bandwidth of the local memory since new data is not retrieved from local memory to compute parity.
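The single-block update can be expressed as a small sketch of the XOR arithmetic. The helper names `xor_bytes` and `rmw_new_parity` and the sample values are assumptions for illustration; the code shows only the arithmetic, not the segment-by-segment buffering.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_new_parity(old_data, old_parity, new_data):
    """New parity for a single updated block in a Read-Modify-Write."""
    intermediate = xor_bytes(old_data, old_parity)  # removes the old data's contribution
    return xor_bytes(intermediate, new_data)        # adds the new data's contribution


# Hypothetical three-block stripe in which the middle block is rewritten.
d0, d1, d2 = bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])
old_parity = xor_bytes(xor_bytes(d0, d1), d2)
new_d1 = bytes([0xAA, 0xBB, 0xCC])
new_parity = rmw_new_parity(d1, old_parity, new_d1)
```

The result equals the parity that a full recomputation over `d0`, `new_d1`, and `d2` would give, without reading the unchanged blocks.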

The present invention is also useful when the RAID disk array is in degraded mode to reconstruct the lost data from the failed disk. The DMA controller reads data segments, from local memory or the disk array, corresponding to the appropriate stripe the lost data was located on. The XOR buffer stores the first data segment transferred and XORs subsequent transfers with the previous buffer content. After all the appropriate data segments are transferred to the host, parity information is transferred to the XOR buffer. The XOR buffer computes the XOR of the buffer contents and the parity. The result is the reconstructed data and the DMA controller transfers the reconstructed data to the host system.
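The degraded-mode reconstruction amounts to XORing the surviving data with the parity. A minimal sketch follows; `reconstruct_lost_block` is a hypothetical name, and the buffer here starts at zero rather than storing the first segment directly, which yields the same result.

```python
def reconstruct_lost_block(surviving_data, parity):
    """Rebuild the block of a failed disk from the surviving blocks and parity."""
    if any(len(block) != len(parity) for block in surviving_data):
        raise ValueError("all blocks must match the parity block size")
    buffer = bytearray(len(parity))  # stand-in for the XOR buffer, initially zero
    for block in surviving_data:     # surviving data segments first
        for i, byte in enumerate(block):
            buffer[i] ^= byte
    for i, byte in enumerate(parity):  # parity is folded in last
        buffer[i] ^= byte
    return bytes(buffer)


# Hypothetical stripe: the disk holding d1 has failed.
d0 = bytes([1, 2, 3, 4])
d1 = bytes([16, 32, 48, 64])
d2 = bytes([255, 0, 255, 0])
parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
rebuilt = reconstruct_lost_block([d0, d2], parity)
```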

The present invention permits use of high performance SCSI busses to connect the RAID subsystem controller and the disk array. SCSI busses allow the disk array subsystem, e.g., the disk drives, to control the ordering as the data is transferred to the disk array. The SCSI protocol allows the SCSI peripheral device to determine when data is transferred. In response to drive requests, the DMA in the SCSI controller takes control of the internal bus in the RAID controller and accesses local memory within the RAID controller. Since the exact time when data is transferred is
not known by the interface controller, SCSI interfaces are best suited to block transfers and data buffering at both the peripheral and controller.

The present invention may be used with any disk interface because the parity computation is performed in parallel with the data transfer at the front-end (host to local memory) rather than the back-end (local memory to disk). The present invention enables the use of more intelligent interfaces though it can function with low-end IDE interfaces. A person skilled in the art will recognize the present invention is operable for drives connected using an EIDE bus, Fibre Channel SCSI, Fast SCSI, or Wide SCSI or other well known interface media and protocols which provide for intelligent control by the peripheral devices attached to the RAID controller.

It is therefore an object of the present invention to provide a DMA controller for transferring a plurality of data segments from a source memory to a destination memory and computing the parity of the plurality of data segments in parallel with the transfer.

It is another object of the present invention to provide a DMA controller with XOR capability for reducing local memory accesses and thereby improving I/O performance by computing parity in parallel with the transfer of data from a source to a destination.

It is yet another object of the present invention to provide a DMA controller for reconstructing data while the storage subsystem is in degraded mode by computing parity in parallel with the transfer of data and parity from a source to a destination.

It is a further object of the present invention to provide a DMA controller for reducing storage subsystem costs by eliminating the need for lock-step transfer of data while computing parity in parallel with the transfer of data from a source to a destination.

The above and other objects, aspects, features, and advantages of the present invention will become apparent from the following description and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the RAID subsystem in which the DMA controller of the present invention is advantageously applicable.

FIG. 2 is a block diagram showing the circuitry within the XOR buffer used to read, store, and compute parity as data segments are transferred from a source to a destination.

FIG. 3 is a block diagram depicting how the DMA controller accesses and reads data from a memory system.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

While the invention is susceptible to various modifications and alternative forms, a specific embodiment thereof has been shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that it is not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

FIG. 1 is a block diagram of the RAID subsystem in which the DMA controller of the present invention is advantageously applicable. The host 10 comprising a CPU and host memory 20 is connected via the system interface bus 30. A PCI bus is commonly used as the host system interface bus 30 due to faster transfer rates as distinguished from older backplane busses such as ISA or EISA. The host system interface bus 30 can be selected based on system performance requirements. The RAID subsystem comprises the RAID controller 40 and the disk array 100.

The front-end interface 50 is a PCI bus bridge. The PCI bus bridge interfaces the host's PCI bus 30 to the RAID controller's internal bus 51. The RAID controller's internal bus 51 is designed to support high data transfer rates. A person skilled in the art will recognize that the RAID controller of the present invention is also operable under other bus architectures such as ISA, EISA, MicroChannel, or PCI.

The RAID controller also includes CPU 60, local memory 80, a DMA controller 71, and a disk drive interface controller 90. The local memory 80 (also referred to herein as cache or cache memory) is preferably implemented using low cost dynamic RAM (DRAM) chips. Since the techniques of the present invention do not require lockstep timing as do prior back-end DMA/Parity techniques, lower cost, lower performance DRAMs may be used for local memory 80. The DMA controller 71 includes an XOR buffer 72 to compute parity.

FIG. 2 depicts how the DMA controller 71 in conjunction with the XOR buffer 72 accesses and computes the parity of data segments as they are transferred from the host memory system to local memory 80. The XOR buffer 72 includes a dual ported 512 byte FIFO 73, address unit 75, and XOR unit 74.

Address unit 75 contains three sets of address registers that store the address corresponding to the destination, or an internal address later translated to the destination address. A first set of address registers defines an address range that is used to select data segments on internal bus 51 [...]. When a data segment is selected, the 9 least significant bits of the address are used to select a FIFO location. Data at this FIFO location is XOR'd with the data on the internal bus 51 and is written back to the FIFO at the same location. A second set of address registers defines an address range so that when an address on the internal bus 51 falls within the specified range, the XOR buffer reads the data from the internal bus. A third set of address registers defines a third range so that when an address on the internal bus 51 falls within the specified range, data from the internal bus is XOR'd with the corresponding location of the FIFO buffer 73 and the result is output onto the internal bus 51 via multiplexor 53. Multiplexor 52 blocks the transfer of data from the internal bus to the front-end interface 50. A control input allows the FIFO 73 to be reset to zero.

Thus, the XOR buffer, in response to the programmed address ranges, is enabled to perform one of four functions as data is transferred on the internal bus. First, the XOR buffer can do nothing if the address of the data segment is outside of the programmed address ranges. Second, the XOR buffer can store the data to the corresponding buffer location by using the 9 least significant bits of the address to select a FIFO location. Third, the XOR circuitry will perform the XOR of the buffer contents and data, and write back the results to the same buffer location. Fourth, the XOR buffer will output the results. A person skilled in the art will recognize the XOR buffer functions can be selectively enabled or disabled by the DMA controller 71 and CPU 60.

As DMA controller 71 transfers data from the host 10 to the local memory 80, it is also transferred to the XOR buffer 72. During the DMA transfer, a series of host addresses may
be translated to a series of internal bus addresses. The XOR
buffer 72 uses address ranges to determine whether
addresses on the internal bus 51 correspond to a valid data
segment of the stripe. Thus, the XOR buffer 72 uses the
programmed address ranges to determine the data to
"snoop" or read as it is transferred to local memory 80. For
example, the range of addresses can be the size of a single
segment and be updated after each segment is transferred.

The XOR buffer 72 has a data capacity smaller than the size
of a disk stripe. In the preferred embodiment, the storage
capacity of the XOR buffer 72 is 512 bytes. A person skilled
in the art will recognize the RAID controller of the present
invention is operable even if the data capacity of the XOR
buffer 72 is decreased or increased.

The following example shows how the present invention
reads data from the internal bus and computes the parity of
portions of a stripe. In response to a write request, the XOR
buffer 72 is reset to all zeros. The address range of the data
is written within the address unit 75. The addresses correspond
to the local memory system address (which is the
destination for the data), or an internal address which is later
translated to the local memory address. The DMA controller
71 transfers a first data segment corresponding to a block
of data from the first data stripe from the host system
memory 20 to local memory 80. As the first data segment is
written to local memory 80, the XOR buffer 72 "snoops,"
that is the XOR buffer 72 reads and stores the data segment
from the RAID subsystem internal bus 51. The XOR buffer
72 "snoops" the first segment because the data address of the
first data segment is within the range of addresses the XOR
buffer 70 is programmed to read.

A second data segment corresponding to a second block
of data from the first data stripe is transferred from the host
system memory 20 to local memory 80. As the XOR buffer
72 "snoops" the second data segment from the RAID
subsystem internal bus, the first data segment and the second
data segment are substantially simultaneously XOR'd to
obtain an intermediate parity result as the data segments are
transferred to local memory. A third data segment corresponding
to a third block of data from the first data stripe is
transferred from host system memory 20 to local memory
80. Similarly, the XOR buffer "snoops" the third data
segment from the RAID subsystem internal bus 51 and
substantially simultaneously performs the XOR of the intermediate
parity result and the third data segment which
results in an updated intermediate parity as the third data
segment is transferred to local memory. A fourth data
segment corresponding to a fourth block of data from the
first data stripe is transferred from host system memory 20
to local memory 80. The XOR buffer "snoops" the fourth
data segment from the RAID subsystem internal bus and
substantially simultaneously performs the XOR of the
updated intermediate parity and the fourth data segment as
the fourth data segment is transferred to local memory.

The resulting parity computation represents the parity of
the first set of segments (since a block of data is typically
comprised of multiple segments). Subsequently, the DMA
controller transfers the buffer contents to local cache
memory.

The present invention repeats this process until all data
segments within host memory 20 corresponding to blocks
within a stripe are transferred to local memory and the parity
is computed for the corresponding blocks within the data
stripe. The final parity result is transferred to local memory
after the parity for corresponding blocks within the stripe is
computed. The XOR buffer 72 is reinitialized after computing
parity for each set of data segments. Later when the
RAID controller is ready to post the data to disk, the disk
controller 90 is configured to retrieve data from the local
memory 80 and write it to the disk array 100.

Though the above discussed embodiment represents the
best presently known mode of practicing the present
invention, those skilled in the art will recognize equivalent
embodiments of the present invention wherein the first data
segment may be handled in other manners. For example, the
FIFO 73 is reset to zero values before the transfer of a
segment in the first block. Each data segment transferred is
then simply XOR'd with the present (accumulating) parity
values of the FIFO 73. This allows all data segments in a
sequence of blocks to be treated identically with respect to
XOR parity generation within the DMA controller 71. Such
design choices in circuit and chip design are well known to
those skilled in the art. Specific criteria including power
dissipation, layout complexity, library component
availability, etc. are considered in choosing among such
equivalent designs.

If all the data blocks corresponding to the disk stripe are
not present in host memory or local memory, a Read-
Modify-Write operation is required in response to a write
request. In a first embodiment, the data segments corresponding
to the incomplete stripe and remaining in host
memory 20 are written to the XOR buffer 72. The data
segment not present in the host memory is read from the disk
array and transferred to local memory 80 and then to the
XOR buffer 72. The data segments are XOR'd with the
previous intermediate parity computations corresponding to
the incomplete stripe and the result is the new parity for the
stripe which is later stored in local memory 80.

In a second embodiment, only one segment or portion of
a segment is not present in the host memory to complete the
stripe, and the segment is not available in local memory but
old data and old parity that can reconstruct the segment is
available. After transferring the old data and
old parity to the XOR buffer 72, the XOR buffer computes
the parity of the old data and old parity. The DMA controller
71 reads the remaining data segments from the host system
memory 20 and the XOR buffer 72 computes the XOR
between the remaining data segments and the previous
computed parity result. The resulting parity for the stripe is
later stored in local memory 80.

In the preferred embodiment, the disk array bus 91 in the
present invention is a SCSI bus. The disk drive interface
controller 90 is a SCSI controller and interfaces the RAID
controller 40 to the disk array 100. A person skilled in the art
will recognize the present invention is operable for disk
drives connected using an EIDE bus, Fibre Channel SCSI,
Fast SCSI, or Wide SCSI or other well known interface
media and protocols which provide for intelligent control by
the peripheral devices attached to the RAID controller.

An exemplary RAID level 5 disk array contains 5 disks
with a block size on each disk mapped as 4k bytes. Each disk
in the array contains a plurality of 4k byte blocks. A parallel
stripe in this example contains 16k bytes of data and a 4k
byte block of parity. The 16k bytes of data are divided into one
4k byte block of data on each of the four disks in the array.
The parity resides in a fifth disk. The present invention,
however, is operable using any other RAID level disk array,
and block size with corresponding stripe size.

FIG. 3 is a block diagram depicting how the DMA
controller accesses and segments data from an exemplary
contiguous host memory system 20. When the host data
physically resides as a contiguous block, FIG. 3 depicts the
segmentation process that can be implemented in software
or hardware as later described in pseudo code. A person
skilled in the art will recognize the segmentation process is
also operable to read data from local memory when the data
is non-contiguous and is described as a plurality of elements
in a scatter/gather list.

In FIG. 3, a file in host memory is segmented into a
plurality of 512 byte segments. If a 16k byte file is stored
in the host, this results in 16k bytes of data and 4k bytes of
parity, which in total results in 20k bytes. Bytes 1 to 4096,
or A1 to H1, represent the first eight 512 byte segments that
correspond to the first 4k byte block that will be stored in
disk drive 110. Similarly, bytes 4097 to 8192, or A2 to H2,
represent the first eight 512 byte segments that correspond to
the first 4k byte block that will be stored in disk drive 120.
Bytes 8193 to 12288, or A3 to H3, represent the first eight
512 byte segments that correspond to the first 4k byte block
that will be stored in disk drive 130. Bytes 12289 to 16384,
or A4 to H4, represent the first eight 512 byte segments that
correspond to the first 4k byte block that will be stored in
disk drive 140. Bytes PA to PH represent the first 4k byte
block that will be stored in disk drive 150 and bytes PA to
PH contain parity information.

Segment A1 represents the first 512 byte segment of the
first block of data in drive 110. Similarly segments A2, A3,
and A4 represent the first 512 byte segment of the first
block of data in drives 120, 130 and 140 respectively. The
XOR, or parity, of A1, A2, A3, and A4 represents PA (i.e. PA
is the XOR of the first 512 byte segments from the first stripe
within host memory). Segments B1 to B4 represent the
second 512 byte segments of the first block of data in drives
110, 120, 130 and 140 respectively. The parity of B1, B2, B3, and
B4 represents PB. To perform the necessary parity computation
PA to PH, the DMA is programmed to read and group
the appropriate noncontiguous 512 byte segments from areas
in host memory 20, which are separated by 4k bytes.
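
As a minimal sketch of the grouping described above, assuming the exemplary geometry (512 byte segments, 4k byte blocks, four data drives) and a contiguous 16k byte stripe image in memory, the following C function computes one parity segment by XORing corresponding bytes that lie 4k bytes apart. The names parity_segment, SEG_SIZE, BLOCK_SIZE and NDATA are illustrative and do not come from the disclosure, and the function is a software restatement of the arithmetic, not the hardware data path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exemplary geometry from the text: 512 byte segments, 4k byte blocks,
 * four data drives. The identifiers themselves are hypothetical. */
enum { SEG_SIZE = 512, BLOCK_SIZE = 4096, NDATA = 4 };

/* Compute parity segment `seg` (0 = PA, 1 = PB, ... 7 = PH) of one stripe.
 * `stripe` points at NDATA * BLOCK_SIZE contiguous data bytes and
 * `parity` receives SEG_SIZE bytes. */
void parity_segment(const uint8_t *stripe, size_t seg, uint8_t *parity)
{
    for (size_t i = 0; i < SEG_SIZE; i++) {
        uint8_t p = 0;
        for (size_t blk = 0; blk < NDATA; blk++) {
            /* Corresponding segments of successive blocks are
             * BLOCK_SIZE (4k) bytes apart in the stripe image. */
            p ^= stripe[blk * BLOCK_SIZE + seg * SEG_SIZE + i];
        }
        parity[i] = p;
    }
}
```

Calling parity_segment(stripe, 0, pa) would yield PA as the XOR of A1, A2, A3 and A4; segment indices 1 through 7 would yield PB through PH.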

For example, the DMA reads and transfers the non-contiguous
segments A1, A2, A3, and A4 sequentially. A
person skilled in the art will recognize the amount of
separation between segments depends on the block size,
which in the present invention is 4k bytes, and that this
invention is operable for block sizes larger or smaller than
4k bytes. Thus, in response to a write request, the DMA
controller 71 outputs the grouped non-contiguous starting
and ending addresses of all the data segments corresponding
to a stripe of data, and bus control signals so that the
destination can access the data directly without the intervention
of the CPU.

For exemplary purposes, as a 512 byte segment of data
from host memory, segment A1, is transferred to local
memory 80, the XOR buffer 72 within the DMA controller
71 "snoops" A1 from internal bus 51. A2 is separated by 4k
bytes from A1 and is the next 512 bytes of data "snooped"
by XOR buffer 72 on internal bus 51 as A2 is transferred
from host memory 20 to local memory 80. As A2 is
transferred, the bitwise XOR circuitry 74 within the XOR
buffer 72 simultaneously computes the XOR, that is the parity,
of data segments A1 and A2. The resulting intermediate
parity value is stored within the XOR buffer 72. As the DMA
controller 71 transfers the next 512 bytes of data, A3, the
XOR buffer "snoops" A3 from internal bus 51 and XOR
circuitry 74 within XOR buffer 72 simultaneously computes
the parity of the previously stored parity and data segment
A3. The resulting parity overwrites the previously stored
parity in the XOR buffer 72. This process is repeated until
the parity segment, PA, is computed. PA is the XOR of A1
and A2 and A3 and A4. After PA is computed, the DMA
controller 71 transfers PA to local memory 80.
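
The snooping behaviour described above can be modelled in software. The following C sketch is an illustration under stated assumptions, not the circuit of the disclosure: the type xor_buffer and the functions xor_buffer_reset, xor_buffer_snoop and dma_transfer_segment are hypothetical names, the bus is modelled one byte at a time, and the programmed address range is assumed to span the whole data portion of the stripe rather than being updated per segment. The 512 byte capacity and the use of the 9 least significant address bits to select a FIFO location follow the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FIFO_SIZE = 512 };     /* capacity in the preferred embodiment */

/* Hypothetical software model of the XOR buffer 72. */
typedef struct {
    uint8_t  fifo[FIFO_SIZE]; /* accumulating parity (FIFO 73)       */
    uint32_t range_lo;        /* first destination address to snoop  */
    uint32_t range_hi;        /* one past the last address to snoop  */
} xor_buffer;

/* Reset the FIFO to all zeros and program the address range. */
void xor_buffer_reset(xor_buffer *xb, uint32_t lo, uint32_t hi)
{
    memset(xb->fifo, 0, sizeof xb->fifo);
    xb->range_lo = lo;
    xb->range_hi = hi;
}

/* Invoked for every byte written on the modelled internal bus. */
void xor_buffer_snoop(xor_buffer *xb, uint32_t addr, uint8_t data)
{
    if (addr < xb->range_lo || addr >= xb->range_hi)
        return;                               /* not part of the stripe */
    /* The 9 least significant address bits select the FIFO location;
     * XOR the incoming data with the stored value and write it back. */
    xb->fifo[addr & (FIFO_SIZE - 1)] ^= data;
}

/* Model of one DMA segment transfer: each byte is copied to local
 * memory while the XOR buffer snoops the same write. */
void dma_transfer_segment(xor_buffer *xb, const uint8_t *host,
                          uint8_t *local, uint32_t dest_addr, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        local[dest_addr + i] = host[i];
        xor_buffer_snoop(xb, dest_addr + (uint32_t)i, host[i]);
    }
}
```

Because the first segment is XOR'd into a zeroed FIFO, it is stored unchanged, and every later segment is handled identically. After A1 through A4 have been transferred to destinations 4k bytes apart, the FIFO holds PA.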
Using the corresponding data segments, this process is
repeated to obtain the parity of the B segments, PB, the C 
segments, PC, the D segments, PD, the E segments, PE, the 
F segments, PF, the G segments, PG, and the H segments, 
PH. The resulting parity computations PB, PC, PD, PE, PF, 
PG, and PH are transferred by the DMA controller 71 to 
local memory 80. After data and parity are stored in local 
memory 80, the disk drive interface 90 may be configured to 
transfer the data from local memory to disk array 100. 

The present invention uses the following addressing 
scheme to determine the address to access each data seg- 
ment. This addressing scheme assists the DMA in transfer- 
ring and placing in suitable order the data segments for the 
XOR parity computations. This addressing scheme may be 
expressed in pseudo code as: 



    for (stripe=0; stripe<stripe_total; stripe=stripe+1)
      for (cur_seg=0; cur_seg<num_seg; cur_seg=cur_seg+1)
        for (cur_block=0; cur_block<ndata; cur_block=cur_block+1)
          for (xfer=0; xfer<seg_size; xfer=xfer+1)
            host_addr=host_start+stripe*(block_size*ndata)
                      +cur_block*block_size
                      +cur_seg*seg_size
                      +xfer;
            dest_addr=host_addr-host_start+dest_start;
            {Move data host_addr to dest_addr and compute parity
             during transfer};
          next
        next
      next
    next



where the associated registers represent:

seg_size: The number of bytes comprising each segment,
    which is nominally the same size as the XOR buffer;
block_size: The number of bytes written to each disk drive;
ndata: The number of data drives;
transfer size: Total number of bytes to be transferred;
num_seg: The number of segments comprising a block
    (block size/segment size);
cur_block: Present block being accessed;
cur_seg: Current segment;
transfer counter: Present number of bytes transferred;
stripe_total: Number of stripes to be transferred (transfer
    size/(block size*ndata));
cur_stripe: Current stripe (the loop variable stripe in the
    pseudo code).
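
The addressing scheme can also be written as compilable C. The sketch below is one possible rendering, with assumptions stated here and in the comments: the struct dma_params and the function segment_order are hypothetical names, the innermost xfer loop is collapsed so that only the starting address of each segment is produced, and addresses are taken to fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the registers listed in the text. */
typedef struct {
    uint32_t seg_size;      /* bytes per segment                  */
    uint32_t block_size;    /* bytes written to each disk drive   */
    uint32_t ndata;         /* number of data drives              */
    uint32_t stripe_total;  /* number of stripes to transfer      */
    uint32_t host_start;    /* base address of the source data    */
    uint32_t dest_start;    /* base address of the destination    */
} dma_params;

/* Fill host_addrs/dest_addrs with the starting address of each segment
 * in transfer order (A1, A2, A3, A4, B1, ...). At most `max` entries
 * are written; the number of entries written is returned. */
size_t segment_order(const dma_params *p, uint32_t *host_addrs,
                     uint32_t *dest_addrs, size_t max)
{
    uint32_t num_seg = p->block_size / p->seg_size;
    size_t n = 0;

    for (uint32_t stripe = 0; stripe < p->stripe_total; stripe++)
        for (uint32_t cur_seg = 0; cur_seg < num_seg; cur_seg++)
            for (uint32_t cur_block = 0; cur_block < p->ndata; cur_block++) {
                if (n == max)
                    return n;
                uint32_t host_addr = p->host_start
                                   + stripe * (p->block_size * p->ndata)
                                   + cur_block * p->block_size
                                   + cur_seg * p->seg_size;
                host_addrs[n] = host_addr;
                /* The destination keeps the same offset from its base. */
                dest_addrs[n] = host_addr - p->host_start + p->dest_start;
                n++;
            }
    return n;
}
```

With seg_size 512, block_size 4096, ndata 4 and one stripe, the function yields 32 segment addresses; the first four are 4k bytes apart (A1 to A4) and the fifth returns to an offset of 512 bytes (B1).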

The segmentation process may also be used for non-contiguous
host data, as described by a scatter/gather list. In
this case, the transfer address, that is the address to transfer
the data segment, is expressed as:
transfer address=(current segment * segment size)+
(current block * block size)+(current stripe * stripe
size)+portion of current segment transferred.
The transfer address is compared against the 'floor' and
'ceiling' of the current scatter/gather element where:
floor=sum of the sizes of all previous scatter/gather elements.
ceiling=floor+size of current scatter/gather element.
If the transfer address lies within the current scatter/gather
element, the floor is subtracted from the transfer count (as
previously defined) and the result is added to the base
physical address of the current scatter/gather element, that
is, the current segment. If the transfer address is greater than
the ceiling of the current scatter/gather element, the segmentation
process advances to the next scatter/gather element
and the transfer address is compared against the floor
and ceiling of the current scatter/gather element. If the
transfer address is less than the floor of the current scatter/
gather element, the segmentation process returns to the
previous scatter/gather element and the transfer address is
compared against the floor and the ceiling of the current
scatter/gather element.
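
The floor and ceiling comparison can be sketched in C as follows. This is an illustrative reading rather than the disclosed implementation: sg_element and sg_translate are hypothetical names, the offset within an element is assumed to be the transfer address minus the floor, and an element is assumed to cover addresses from its floor up to but not including its ceiling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather element: base physical address and size. */
typedef struct {
    uint32_t base;
    uint32_t size;
} sg_element;

/* Translate a logical transfer address into a physical address.
 * *idx and *floor carry the current element and its floor between
 * calls, so the search resumes from the element used last time.
 * Returns 0 on success, -1 if the address lies outside the list. */
int sg_translate(const sg_element *list, size_t count,
                 size_t *idx, uint32_t *floor,
                 uint32_t xfer_addr, uint32_t *phys)
{
    if (count == 0 || *idx >= count)
        return -1;

    for (;;) {
        uint32_t ceiling = *floor + list[*idx].size;

        if (xfer_addr < *floor) {
            /* Below the current element: step back to the previous one. */
            if (*idx == 0)
                return -1;
            (*idx)--;
            *floor -= list[*idx].size;
        } else if (xfer_addr >= ceiling) {
            /* At or above the ceiling: advance to the next element. */
            if (*idx + 1 >= count)
                return -1;
            *floor = ceiling;
            (*idx)++;
        } else {
            /* Inside the element: offset from the floor plus the base. */
            *phys = list[*idx].base + (xfer_addr - *floor);
            return 0;
        }
    }
}
```

Because segments are visited in the order A1, A2, A3, A4, B1, the transfer address moves forward by 4k bytes three times and then jumps back, which is why the sketch supports stepping to the previous element as well as the next.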

When a disk failure occurs and a drive is not operating,
that is the disk array is operating in degraded mode, the
present invention is operable to reconstruct data from the
failed drive. In a first embodiment, after responding to a read
or write request, the disk interface controller 90 transfers
data from the operating disk drives in the disk array 100 into
local memory 80. The XOR buffer 72 is reset to all zeros and
the address range is set by writing control registers with
addresses that correspond to the host system address (which
is the destination for the data), or an internal address which
is later translated to the host address. The DMA controller 71
transfers the data in 512 byte segments from local memory
to the host memory 20. As the data is transferred, the XOR buffer 72
"snoops" the data segments on the internal bus 51 corresponding
to the appropriate stripe the lost data was located
on. Each data transfer has a destination address in host
memory, which corresponds to a portion of the addressing as
represented in FIG. 3. The XOR buffer 72 simultaneously
computes an intermediate parity of the data segments as
each data segment is sent to the host memory 20. After all
the appropriate data segments corresponding to the stripe are
transferred to the host memory 20, the DMA controller 71
transfers the parity corresponding to the stripe from local
memory and the XOR buffer simultaneously computes the
XOR of the intermediate parity and the parity corresponding
to the stripe. The result is the reconstructed data and the
DMA controller 71 transfers the reconstructed data to the
host system memory 20. The reconstructed data may be
transferred 'real time' to the host system as parity is transferred
to the XOR buffer, or the contents of the XOR buffer
may be transferred to local memory 80 to support later
transfer to a newly installed or hot spare disk. For non-buffered
applications, such as video streaming, a larger XOR
buffer may be desirable for the simultaneous generation of
data of the non-functioning drive and transfer of data to the
host.
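
The reconstruction rests on the identity that XORing the surviving data segments with the stored parity yields the missing segment. The following C sketch shows only that arithmetic, under the assumption that all segments are already in memory; the function name reconstruct_segment and its parameters are illustrative and are not part of the disclosure, which performs the XOR in the XOR buffer while the data is in flight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rebuild the segment of a failed drive.
 * survivors:  the corresponding segments from the operating data drives
 * nsurvivors: how many such segments there are
 * parity:     the stored parity segment for the same stripe position
 * out:        receives seg_size bytes of reconstructed data */
void reconstruct_segment(const uint8_t *const *survivors, size_t nsurvivors,
                         const uint8_t *parity, size_t seg_size,
                         uint8_t *out)
{
    memset(out, 0, seg_size);            /* XOR buffer reset to all zeros */

    /* Intermediate parity of the surviving data segments. */
    for (size_t s = 0; s < nsurvivors; s++)
        for (size_t i = 0; i < seg_size; i++)
            out[i] ^= survivors[s][i];

    /* XOR with the stored parity leaves the missing segment. */
    for (size_t i = 0; i < seg_size; i++)
        out[i] ^= parity[i];
}
```

For the exemplary five-disk array, survivors would hold three 512 byte data segments and parity the matching segment read from the parity drive.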

In the second embodiment, after responding to a read or
write request, the disk interface controller 90 transfers data
from the operating disk drives in the disk array 100 into the
host system memory 20. The XOR buffer 72 is reset to all
zeros and the XOR address range is set by writing the
control registers with an address corresponding to the host
system address (the destination of the data), or an internal
address which is later translated to the host address. The
CPU instructs the disk drive controller 90 to transfer a first
set of data from the operating drives of the disk array 100 to
the host system memory 20, each transfer being less than or
equal to the XOR buffer 72 size. As the data is transferred, the XOR
buffer 72 "snoops" the data segments on the internal bus 51
and simultaneously computes an intermediate parity of the
data segments as each data segment is sent to the host
memory 20. Each data transfer has a destination address in
host memory 20 which corresponds to a portion of the
addressing as represented in FIG. 3. After all the appropriate
data segments corresponding to the stripe are transferred to
the host memory 20, the CPU instructs the disk drive
controller 90 to transfer the parity corresponding to the
stripe to the internal XOR address. The XOR buffer 72
simultaneously computes the XOR of the intermediate parity
and the parity corresponding to the stripe. The result is the
reconstructed data and the DMA controller 71 transfers the
reconstructed data to the host system memory 20.



While the invention has been illustrated and described in 
detail in the drawings and foregoing description, such illus- 
tration and description is to be considered as exemplary and 
not restrictive in character, it being understood that only the 
preferred embodiment and minor variants thereof have been 
shown and described and that all changes and modifications 
that come within the spirit of the invention are desired to be 
protected. 

What is claimed is: 

1. In a DMA controller of a RAID storage controller in a 
RAID storage subsystem, a method to compute parity com- 
prising the steps of: 

transferring a data portion of a stripe from a host system 
to a RAID storage subsystem through said DMA con- 
troller from a random access source memory to a 
destination memory, 

wherein said data portion of said stripe is comprised of a 
plurality of data blocks 

wherein each of said plurality of data blocks is comprised 
of a plurality of data segments, and 

wherein said plurality of data segments are stored in said 
source memory in a first predetermined order and 
wherein said data segments are transferred to said 
destination memory in a second predetermined order 
different from said first predetermined order; and 

computing, within said DMA controller, a plurality of 
XOR parity segments corresponding to said plurality of 
data segments, 

wherein the step of transferring a data portion of a stripe 
further comprises the steps of: 

a) transferring a first data segment of said plurality of 
data segments of a data block of said plurality of data 
blocks of said stripe; 

b) transferring a corresponding data segment of said 
plurality of data segments of a next data block of said 
plurality of data blocks; 

c) storing XOR parity generated by said computing step 
in a buffer; 

d) repeating steps b) and c) for each data block of said 
stripe until each corresponding data segment of each 
block of said plurality of data blocks of said stripe 
have been transferred; and 

e) repeating steps a) through d) for a next data segment 
of said plurality of data segments in each of said 
plurality of blocks in said data portion of said stripe 
until all of said plurality of data segments have been 
transferred, and 

wherein the computing step is performed substantially 
simultaneously with said step of transferring. 

2. The method of claim 1 further comprising the step of: 
storing each of said plurality of XOR parity segments in 

said destination memory. 

3. The method of claim 1 wherein said buffer is a FIFO for 
XOR parity accumulation and wherein the step of transfer- 
ring said plurality of data segments further comprises the 
steps of: 

a) resetting said FIFO to all zeros; 

b) transferring a data segment of said plurality of data 
segments of a data block of said plurality of data 
blocks; 

c) storing XOR parity generated by said computing step 
in said FIFO; 

d) repeating steps b) and c) for each data block of said 
plurality of data blocks of said stripe until each corre- 
sponding data segment of each data block of said 
plurality of data blocks of said stripe have been trans- 
ferred; 
e) repeating steps a) through d) for a next data segment of
said plurality of data segments in each of said plurality 
of blocks in said data portion of said stripe until all of 
said plurality of data segments have been transferred. 

4. The method of claim 1 wherein said buffer is a FIFO for
XOR parity accumulation and wherein approximately equal 

in size to the size of one of said plurality of data segments 
and wherein said step of transferring a plurality of data 
segments includes the step of: 

reading said data segments from said source memory in a
predetermined order operable to compute XOR parity 
of corresponding segments of said data portion of said 
stripe substantially in parallel with the transfer of said 
plurality of data segments. 

5. The method of claim 1 further comprising a storage
subsystem including a plurality of disk drives and is oper- 
ating in a degraded mode having at least one non-functional 
disk drive with corresponding missing data segments from 
said plurality of data segments, and 

wherein the step of transferring includes the steps of:
transferring said plurality of data segments, wherein 
said transfer is devoid of said missing data segments; 
and 

transferring previously computed associated parity 
segments, and

wherein the step of computing a plurality of XOR 
parity segments comprises the step of: 
computing a plurality of XOR parity segments rep- 
resentative of said missing data segments. 

6. A DMA controller for computing XOR parity in a
RAID storage subsystem comprising: 

DMA transfer means for transferring a data portion of a 
stripe from a random access source memory to a 
destination memory in a predetermined order different
than the order in which said stripe is stored in said 
source memory; 

a FIFO for storing parity values generated in said DMA 
controller; 

XOR generation means coupled to said FIFO; and
XOR buffer circuit coupled to said DMA transfer means 
for capturing said stripe as it is transferred by said 
DMA transfer means wherein said circuit is operable to 
control said FIFO and said XOR generation means in 
response to transfer of said stripe by said DMA transfer
means, 

wherein said stripe is comprised of a plurality of blocks 
and wherein each block is comprised of a plurality of 
segments, and 

wherein said DMA transfer means further comprises:

means for transferring a first data segment of said 
plurality of data segments of a data block of said 
plurality of data blocks of said stripe; 

means for transferring a corresponding data segment of 
said plurality of data segments of a next data block
of said plurality of data blocks; 

means for storing XOR parity generated by said XOR 
buffer circuit; 

first means for repeating operation of said means for 
transferring a corresponding data segment and
operation of said means for storing for each data 
block of said stripe until each corresponding data 
segment of each block of said plurality of data blocks 
of said stripe have been transferred; and 

first means for repeating operation of said means for
transferring a first data segment and operation of said 
means for transferring a corresponding data segment 
and operation of said means for storing and operation
of said means for repeating for a next data segment 
of said plurality of data segments in each of said 
plurality of blocks in said data portion of said stripe 
until all of said plurality of data segments have been 
transferred. 

7. The DMA controller of claim 6 wherein said XOR 
buffer circuit includes: 

first means for controlling said memory to store data 
captured during transfer of said plurality of data seg- 
ments in said memory; 

second means for controlling said memory and said XOR 
generation means to compute the bitwise XOR of data 
previously stored in said memory and data captured 
during transfer of said plurality of data segments and to 
store said bitwise XOR result in said memory; and 

third means for controlling said memory to read the 
contents of said memory for purposes of transferring 
parity data from said memory to said destination 
memory. 

8. A DMA controller for use in a RAID storage controller 
of a RAID storage subsystem, said DMA controller com- 
prising: 

means for transferring a data portion of a stripe through 
said DMA controller from a random access source 
memory to a destination memory, 

wherein said stripe is stored in a predetermined first order 
in said source memory and wherein said stripe is 
transferred to said destination memory in a predeter- 
mined second order different from said predetermined 
first order; and 

means for computing, within said DMA controller, a 
plurality of XOR parity segments corresponding to said 
stripe, 

wherein said means for computing and said means for 

transferring are operable substantially simultaneously, 
wherein said stripe is comprised of a plurality of blocks 

and wherein each block is comprised of a plurality of 

segments, and 
wherein said means for transferring further comprises: 

means for transferring a first data segment of said 
plurality of data segments of a data block of said 
plurality of data blocks of said stripe; 

means for transferring a corresponding data segment of 
said plurality of data segments of a next data block 
of said plurality of data blocks; 

buffer means for storing XOR parity generated by said 
means for computing; 

first means for repeating operation of said means for 
transferring a corresponding data segment and 
operation of said means for storing for each data 
block of said stripe until each corresponding data 
segment of each block of said plurality of data blocks 
of said stripe have been transferred; and 

first means for repeating operation of said means for 
transferring a first data segment and operation of said 
means for transferring a corresponding data segment 
and operation of said means for storing and operation 
of said means for repeating for a next data segment 
of said plurality of data segments in each of said 
plurality of blocks in said data portion of said stripe 
until all of said plurality of data segments have been 
transferred. 

9. The DMA controller of claim 8 further comprising: 
means for storing each of said plurality of XOR parity 

segments in said destination memory. 
10. The DMA controller of claim 9 wherein said buffer
means includes a FIFO for XOR parity accumulation, said 
FIFO being approximately equal in size to the size of one of 
said plurality of data segments. 

11. The DMA controller of claim 8 further comprising a
storage subsystem including a plurality of disk drives and is 
operating in a degraded mode having at least one non- 
functional disk drive with corresponding missing data seg- 
ments from said plurality of data segments, and 

wherein said means for transferring includes:
means for transferring said plurality of data segments, 

wherein said means for transfer is devoid of said 

missing data segments; and 
means for transferring previously computed associated 

parity segments, and
wherein said means for computing a plurality of XOR 

parity segments includes: 

means for computing a plurality of XOR parity 
segments representative of said missing data segments.

12. A system within a RAID storage system controller for 
performing transfer of a RAID stripe substantially simulta- 
neous with computation of error detection and correction 
information, said system comprising: 

a DMA transfer controller for transferring said RAID
stripe stored in a first predetermined order a random 
access source memory to a destination memory in a 
second predetermined order; and 

an error detection and correction computation element
coupled to said DMA transfer controller for computing 
said error detection and correction information substan- 
tially simultaneously with the transfer of said RAID 
stripe by said DMA transfer controller, 

wherein said RAID stripe is comprised of a plurality of
data blocks 

wherein each of said plurality of data blocks is comprised 

of a plurality of data segments, and 
wherein said DMA transfer controller is controllably 

operable to:

a) transfer a first data segment of said plurality of data 
segments of a data block of said plurality of data 
blocks of said stripe; 

b) transfer a corresponding data segment of said plurality
of data segments of a next data block of said
plurality of data blocks; 
c) store XOR parity generated by said computing step
in a buffer; 

d) repeat steps b) and c) for each data block of said 
stripe until each corresponding data segment of each 
block of said plurality of data blocks of said stripe 
have been transferred; and 

e) repeat steps a) through d) for a next data segment of 
said plurality of data segments in each of said 
plurality of blocks in said data portion of said stripe 
until all of said plurality of data segments have been 
transferred. 

13. The system of claim 12 wherein said error detection 
and correction computation element is operable in a plurality 
of modes and includes: 

at least one programmable address range register to 
controllably select an operating mode from said plu- 
rality of modes. 

14. The system of claim 13 wherein said buffer includes: 
a FIFO buffer and where said error detection and correc- 
tion computation element is controllably operable in a 
mode wherein data transferred by said DMA transfer 
controller is stored in said FIFO buffer for purposes of 
initializing said error detection and correction informa- 
tion. 

15. The system of claim 13 wherein said buffer includes: 
a FIFO buffer and where said error detection and correc- 
tion computation element is controllably operable in a 
mode wherein data transferred by said DMA transfer 
controller is XOR'd with corresponding error detection 
and correction information presently stored in said 
FIFO buffer for purposes of updating said error detec- 
tion and correction information. 

16. The system of claim 13 wherein said buffer includes: 
a FIFO buffer and where said error detection and correc- 
tion computation element is controllably operable in a 
mode wherein said source memory comprises said 
FIFO buffer and wherein said data transferred by said 
DMA transfer controller comprises said error detection 
and correction information in said FIFO buffer and 
wherein said error detection and correction information 
is transferred by said DMA transfer controller to said 
destination memory. 

* * * * * 